import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import xgboost as xgb
plt.rcParams['figure.figsize'] = (10, 10)
kakamana
January 21, 2023
You will learn how to adjust XGBoost’s parameters and how to tune them efficiently so that you can supercharge the performance of your models.
This Fine-tuning your XGBoost model post is part of the Datacamp course Extreme Gradient Boosting with XGBoost.
This is my learning experience of data science through DataCamp.
Now that you’ve seen the effect that tuning has on the overall performance of your XGBoost model, let’s turn the question on its head and see if you can figure out when tuning your model might not be the best idea.
Let’s start with parameter tuning by seeing how the number of boosting rounds (the number of trees you build) impacts the out-of-sample performance of your XGBoost model. You’ll use xgb.cv() inside a for loop and build one model per num_boost_round parameter.
Here, you’ll continue working with the Ames housing dataset. The features are available in the array X, and the target vector is contained in y.
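The code below assumes X and y were created earlier in the notebook. A minimal sketch of that setup, assuming a preprocessed Ames housing CSV with the target (sale price) in the last column (the file name here is illustrative, not from the original post):
# Assumed setup (illustrative file name): load the preprocessed Ames housing data
ames = pd.read_csv('ames_housing_trimmed_processed.csv')
X, y = ames.iloc[:, :-1], ames.iloc[:, -1]   # feature matrix and target vector (sale price)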
# Create the housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)
# Create the parameter dictionary for each tree: params
params = {"objective":"reg:squarederror", "max_depth":3}
# Create list of number of boosting rounds
num_rounds = [5, 10, 15]
# Empty list to store final round rmse per XGBoost model
final_rmse_per_round = []
# Iterate over num_rounds and build one model per num_boost_round parameter
for curr_num_rounds in num_rounds:
    # Perform cross-validation: cv_results
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                        num_boost_round=curr_num_rounds, metrics='rmse',
                        as_pandas=True, seed=123)
    # Append final round RMSE
    final_rmse_per_round.append(cv_results['test-rmse-mean'].tail().values[-1])
# Print the result DataFrame
num_rounds_rmses = list(zip(num_rounds, final_rmse_per_round))
print(pd.DataFrame(num_rounds_rmses, columns=['num_boosting_rounds', 'rmse']))
print("\nAs you can see, increasing the number of boosting rounds decreases the RMSE.")
num_boosting_rounds rmse
0 5 50903.299752
1 10 34774.194090
2 15 32895.099185
As you can see, increasing the number of boosting rounds decreases the RMSE.
Now, instead of attempting to cherry-pick the best possible number of boosting rounds, you can very easily have XGBoost automatically select the number of boosting rounds for you within xgb.cv(). This is done using a technique called early stopping.
Early stopping works by testing the XGBoost model after every boosting round against a hold-out dataset and stopping the creation of additional boosting rounds (thereby finishing training of the model early) if the hold-out metric ("rmse" in our case) does not improve for a given number of rounds. Here you will use the early_stopping_rounds parameter in xgb.cv() with a large possible number of boosting rounds (50). Bear in mind that if the holdout metric continuously improves up through when num_boost_round is reached, then early stopping does not occur.
# Create your housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)
# Create the parameter dictionary for each tree: params
params = {"objective":"reg:squarederror", "max_depth":4}
# Perform cross-validation with early-stopping: cv_results
cv_results = xgb.cv(dtrain=housing_dmatrix, nfold=3, params=params, metrics="rmse",
early_stopping_rounds=10, num_boost_round=50, as_pandas=True, seed=123)
# Print cv_results
print(cv_results)
train-rmse-mean train-rmse-std test-rmse-mean test-rmse-std
0 141871.635216 403.633062 142640.653507 705.559723
1 103057.033818 73.768079 104907.664683 111.117033
2 75975.967655 253.727043 79262.056654 563.766693
3 57420.530642 521.658273 61620.137859 1087.693428
4 44552.956483 544.170426 50437.560906 1846.446643
5 35763.948865 681.796675 43035.659539 2034.471115
6 29861.464164 769.571418 38600.880800 2169.796804
7 25994.675122 756.520639 36071.817710 2109.795408
8 23306.836299 759.237848 34383.186387 1934.547433
9 21459.770256 745.624640 33509.140338 1887.375358
10 20148.721060 749.612186 32916.806725 1850.893437
11 19215.382607 641.387200 32197.833474 1734.456654
12 18627.388962 716.256240 31770.852340 1802.154296
13 17960.695080 557.043324 31482.782172 1779.124406
14 17559.736640 631.413137 31389.990252 1892.320326
15 17205.713357 590.171774 31302.883291 1955.165882
16 16876.571801 703.631953 31234.058914 1880.706205
17 16597.662170 703.677363 31318.347820 1828.860754
18 16330.460661 607.274258 31323.634893 1775.909992
19 16005.972387 520.470815 31204.135450 1739.076237
20 15814.300847 518.604822 31089.863868 1756.022175
21 15493.405856 505.616461 31047.997697 1624.673447
22 15270.734205 502.018639 31056.916210 1668.043691
23 15086.381896 503.913078 31024.984403 1548.985086
24 14917.608289 486.206137 30983.685376 1663.131135
25 14709.589477 449.668262 30989.476981 1686.667218
26 14457.286251 376.787759 30952.113767 1613.172390
27 14185.567149 383.102597 31066.901381 1648.534545
28 13934.066721 473.465580 31095.641882 1709.225578
29 13749.644941 473.670743 31103.886799 1778.879849
30 13549.836644 454.898742 30976.084872 1744.514518
31 13413.484678 399.603422 30938.469354 1746.053330
32 13275.915700 415.408595 30931.000055 1772.469405
33 13085.878211 493.792795 30929.056846 1765.541040
34 12947.181279 517.790033 30890.629160 1786.510472
35 12846.027264 547.732747 30884.493051 1769.728787
36 12702.378727 505.523140 30833.542124 1691.002007
37 12532.244170 508.298300 30856.688154 1771.445485
38 12384.055037 536.224929 30818.016568 1782.785175
39 12198.443769 545.165604 30839.393263 1847.326671
40 12054.583621 508.841802 30776.965294 1912.780332
41 11897.036784 477.177932 30794.702627 1919.675130
42 11756.221708 502.992363 30780.956160 1906.820178
43 11618.846752 519.837483 30783.754746 1951.260120
44 11484.080227 578.428500 30776.731276 1953.447810
45 11356.552654 565.368946 30758.543732 1947.454939
46 11193.557745 552.298986 30729.971937 1985.699239
47 11071.315547 604.090125 30732.663173 1966.997252
48 10950.778492 574.862853 30712.241251 1957.750615
49 10824.865446 576.665678 30720.853939 1950.511037
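Early stopping did not fire in this run: the best mean test RMSE occurs at round 48, so there were never 10 consecutive rounds without improvement, and all 50 rounds are shown. As a small sketch of my own (not part of the original exercise), note that xgb.cv truncates its result DataFrame at the best iteration when early stopping does fire, so you can read off what was kept directly:
# Sketch: inspect what xgb.cv kept (the DataFrame is truncated if early stopping fires)
print("Boosting rounds kept:", len(cv_results))
print("Best mean test RMSE:", cv_results["test-rmse-mean"].min())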
It’s time to practice tuning other XGBoost hyperparameters in earnest and observing their effect on model performance! You’ll begin by tuning "eta", also known as the learning rate.
The learning rate in XGBoost is a parameter that can range between 0 and 1; lower values of "eta" shrink the feature weights of each new tree more strongly, making the boosting process more conservative (stronger regularization).
# Create your housing DMatrix: housing_dmatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)
# Create the parameter dictionary for each tree (boosting round)
params = {"objective":"reg:squarederror", "max_depth":3}
# Create list of eta values and empty list to store final round rmse per xgboost model
eta_vals = [0.001, 0.01, 0.1]
best_rmse = []
# Systematically vary the eta
for curr_val in eta_vals:
    params['eta'] = curr_val
    # Perform cross-validation: cv_results
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                        early_stopping_rounds=5, num_boost_round=10, metrics='rmse', seed=123,
                        as_pandas=True)
    # Append the final round rmse to best_rmse
    best_rmse.append(cv_results['test-rmse-mean'].tail().values[-1])
# Print the result DataFrame
print(pd.DataFrame(list(zip(eta_vals, best_rmse)), columns=['eta', 'best_rmse']))
eta best_rmse
0 0.001 195736.402543
1 0.010 179932.183986
2 0.100 79759.411808
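Keep in mind that only 10 boosting rounds were allowed here, which heavily handicaps the smaller learning rates: a small eta shrinks each tree's contribution, so it usually needs many more rounds to reach a comparable RMSE. As a sketch of my own (not part of the exercise), pairing eta=0.01 with a larger round budget and early stopping:
# Illustration only: a small learning rate typically needs more boosting rounds
params = {"objective": "reg:squarederror", "max_depth": 3, "eta": 0.01}
cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=3,
                    num_boost_round=500, early_stopping_rounds=10,
                    metrics='rmse', as_pandas=True, seed=123)
print(cv_results['test-rmse-mean'].tail(1))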
In this exercise, your job is to tune max_depth, which is the parameter that dictates the maximum depth that each tree in a boosting round can grow to. Smaller values will lead to shallower trees, and larger values to deeper trees.
# Create your housing DMatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)
# Create the parameter dictionary
params = {"objective":"reg:squarederror"}
# Create list of max_depth values
max_depths = [2, 5, 10, 20]
best_rmse = []
for curr_val in max_depths:
    params['max_depth'] = curr_val
    # Perform cross-validation
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=2,
                        early_stopping_rounds=5, num_boost_round=10, metrics='rmse', seed=123,
                        as_pandas=True)
    # Append the final round rmse to best_rmse
    best_rmse.append(cv_results['test-rmse-mean'].tail().values[-1])
# Print the result DataFrame
print(pd.DataFrame(list(zip(max_depths, best_rmse)), columns=['max_depth', 'best_rmse']))
max_depth best_rmse
0 2 37957.469464
1 5 35596.599504
2 10 36065.547345
3 20 36739.576068
Now, it’s time to tune "colsample_bytree". You’ve already seen something similar if you’ve ever worked with scikit-learn’s RandomForestClassifier or RandomForestRegressor, where it is called max_features. Although they are named differently, both parameters restrict the fraction of features the model can choose from; in xgboost, colsample_bytree samples the columns once per tree (rather than at every split) and must be specified as a float between 0 and 1.
# Create your housing DMatrix
housing_dmatrix = xgb.DMatrix(data=X, label=y)
# Create the parameter dictionary
params = {"objective":"reg:squarederror", "max_depth":3}
# Create list of hyperparameter values: colsample_bytree_vals
colsample_bytree_vals = [0.1, 0.5, 0.8, 1]
best_rmse = []
# Systematically vary the hyperparameter value
for curr_val in colsample_bytree_vals:
    params['colsample_bytree'] = curr_val
    # Perform cross-validation
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=2,
                        num_boost_round=10, early_stopping_rounds=5,
                        metrics="rmse", as_pandas=True, seed=123)
    # Append the final round rmse to best_rmse
    best_rmse.append(cv_results["test-rmse-mean"].tail().values[-1])
# Print the resultant DataFrame
print(pd.DataFrame(list(zip(colsample_bytree_vals, best_rmse)),
                   columns=["colsample_bytree","best_rmse"]))
print("\nThere are several other individual parameters that you can tune, such as `'subsample'`, which dictates the fraction of the training data that is used during any given boosting round. Next up: Grid Search and Random Search to tune XGBoost hyperparameters more efficiently!")
colsample_bytree best_rmse
0 0.1 40918.116895
1 0.5 35813.904168
2 0.8 35995.678734
3 1.0 35836.044343
There are several other individual parameters that you can tune, such as `'subsample'`, which dictates the fraction of the training data that is used during any given boosting round. Next up: Grid Search and Random Search to tune XGBoost hyperparameters more efficiently!
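As a sketch of my own (not shown in the original exercise), 'subsample' takes a float between 0 and 1 and can be tuned with exactly the same loop pattern used above:
# Illustration: tune 'subsample', the fraction of training rows used per boosting round
params = {"objective": "reg:squarederror", "max_depth": 3}
subsample_vals = [0.3, 0.6, 0.9, 1.0]
best_rmse = []
for curr_val in subsample_vals:
    params['subsample'] = curr_val
    cv_results = xgb.cv(dtrain=housing_dmatrix, params=params, nfold=2,
                        num_boost_round=10, early_stopping_rounds=5,
                        metrics="rmse", as_pandas=True, seed=123)
    best_rmse.append(cv_results["test-rmse-mean"].tail().values[-1])
# Print the result DataFrame
print(pd.DataFrame(list(zip(subsample_vals, best_rmse)), columns=["subsample", "best_rmse"]))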
Now that you’ve learned how to tune parameters individually with XGBoost, let’s take your parameter tuning to the next level by using scikit-learn’s GridSearchCV and RandomizedSearchCV, which search over combinations of parameter values with internal cross-validation. You will use these to find the best model from a collection of possible parameter values across multiple parameters simultaneously (exhaustively, in the case of grid search). Let’s get to work, starting with GridSearchCV!
from sklearn.model_selection import GridSearchCV
# Create the parameter grid: gbm_param_grid
gbm_param_grid = {
    'colsample_bytree': [0.3, 0.7],
    'n_estimators': [50],
    'max_depth': [2, 5]
}
# Instantiate the regressor: gbm
gbm = xgb.XGBRegressor()
# Perform grid search: grid_mse
grid_mse = GridSearchCV(param_grid=gbm_param_grid, estimator=gbm,
                        scoring='neg_mean_squared_error', cv=4, verbose=1)
# Fit grid_mse to the data
grid_mse.fit(X, y)
# Print the best parameters and lowest RMSE
print("Best parameters found: ", grid_mse.best_params_)
print("Lowest RMSE found: ", np.sqrt(np.abs(grid_mse.best_score_)))
Fitting 4 folds for each of 4 candidates, totalling 16 fits
Best parameters found: {'colsample_bytree': 0.3, 'max_depth': 5, 'n_estimators': 50}
Lowest RMSE found: 28986.18703093561
Often, GridSearchCV can be really time consuming, so in practice you may want to use RandomizedSearchCV instead, as you will do in this exercise. The good news is that you only have to make a few modifications to your GridSearchCV code to do RandomizedSearchCV. The key difference is that you have to specify a param_distributions parameter instead of a param_grid parameter.
from sklearn.model_selection import RandomizedSearchCV
# Create the parameter grid: gbm_param_grid
gbm_param_grid = {
    'n_estimators': [25],
    'max_depth': range(2, 12)
}
# Instantiate the regressor: gbm
gbm = xgb.XGBRegressor(n_estimators=10)
# Perform random search: randomized_mse
randomized_mse = RandomizedSearchCV(param_distributions=gbm_param_grid, estimator=gbm,
                                    scoring='neg_mean_squared_error', n_iter=5, cv=4,
                                    verbose=1)
# Fit randomized_mse to the data
randomized_mse.fit(X, y)
# Print the best parameters and lowest RMSE
print("Best parameters found: ", randomized_mse.best_params_)
print("Lowest RMSE found: ", np.sqrt(np.abs(randomized_mse.best_score_)))
Fitting 4 folds for each of 5 candidates, totalling 20 fits
Best parameters found: {'n_estimators': 25, 'max_depth': 4}
Lowest RMSE found: 29998.4522530019
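Since both search objects refit the best parameter combination on the full data by default (refit=True), you can pull out the fitted model and generate predictions with it; a short sketch of my own using the randomized_mse object from above:
# The best estimator is refit on all of X, y when refit=True (the default)
best_model = randomized_mse.best_estimator_
preds = best_model.predict(X)
print("Predictions for the first five houses:", preds[:5])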